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Abstract. Subspace recycling methods, a class of Krylov subspace deflation techniques, have 
been shown to have the potential to accelerate convergence of Krylov subspace methods. In particu- 
lar, they can be useful when solving sequences of slowly-changing linear systems. We wish to extend 
such methods to solve sequences of linear systems, where for each system, we also solve a family 
of shifted systems in which the coefficient matrices only differ by multiples of the identity from a 
base system matrix. In this work, we demonstrate the difficulty of extending recycling techniques to 
solve multiple shifted systems while maintaining the fixed storage property. As an alternative, we 
introduce a scheme which constructs approximate corrections to the solutions of the shifted systems 
at each cycle while only minimizing the base system residual. At convergence of the base system 
solution, we apply the method recursively to the remaining unconverged systems. The method is 
robust enough to be applied to sequences of systems where the base system changes slowly and the 
shifts differ for each base system. We present numerical examples involving systems arising in lattice 
quantum chromodynamics. 
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1. Introduction. We consider the solution of a sequence of families of non- 
Hermitian linear systems. Let T denote a family of coefficient matrices differing by 
multiples of the identity. In other words, 

T = |a + i C C nxn , (1.1) 

where L is the number of matrices in the family, and we are solving the family of 
linear systems 

A + o-Wl) x<X> = b for £ = 1,...,L. (1.2) 



We call the numbers {cr^}. C C shifts, A the base matrix, and A + crl a 
shifted matrix. Systems of the form (|1.2p are called shifted linear systems. There are 
many applications which warrant the solution of a family of shifted linear systems 
with coefficient matrices belonging to T , such as those arising in lattice quantum 
chromodynamics (QCD) (see, e.g., [TS]) as well as other applications such as Tikhonov- 
Philips regularization, global methods of nonlinear analysis, and Newton trust region 
methods [5]. Krylov subspace methods have been proposed to simultaneously solve 
this family of systems [13l HH [30] . Our goal is to adapt subspace recycling technology 
to accelerate existing methods for solving sequences of shifted linear systems. In 
this paper, we describe the challenges in designing a method which extends subspace 
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recycling technology [33] to this setting. We then present a strategy which mitigates 
these difficulties. 

Let Ti denote the ith family of linear systems, defined by 

Ti = {A, + ^fl}^ C C nxn , 

where Li denotes the number of linear systems to be solved at step i. In other words, 
at step i, for shifts j we are solving systems of the form 

(A* + af 5 l) xf ) = hi for l = \...Li. 

This is the problem we target in this paper. However, for simplicity of discussion, we 
frequently will focus on one family of linear systems with only one shift and drop the 
index i, yielding two systems of the form 

Ax = b (1.3) 
(A + ffI)x w =b. (1.4) 

When dealing with only one family, in the absence of initial deflation subspace, tech- 
niques have been developed to solve the family of systems simultaneously, building 
recycled subspaces from harmonic Ritz vectors; see, e.g., [5]. 

In the next section, we review some existing methods for solving (jl.3|) and (|1.4|) . 
and we describe the framework of subspace recycling used in, e.g., [23] . In Section [3J 
we discuss the difficulties of adapting subspace recycling to shifted systems. We in- 
troduce a new algorithm (ideal but memory-intensive and computationally expensive) 
which combines ideas for solving (jl.3[) and (|1.4j) simultaneously while using subspace 
recycling techniques. This method is a direct extension of the one presented in [14] . 
In Section 0] we show that it is not possible to construct solutions for all shifted sys- 
tems over the same augmented subspace in a way that is compatible with restarting. 
We present a scheme in Section [5] which will produce improved approximations for 
the shifted system while solving the base system, using only one recycled subspace. 
This method is derived from the shifted GMRES method jT3], but it is not a direct 
extension. In Section[B] we present numerical results for a family of simple bidiagonal 
matrices and for some sequences of QCD matrices obtained from [T] and [20] . 

2. Preliminaries. In many Krylov subspace iterative methods, recall that we 
generate an orthonormal basis for 

/Cj(A,u) = span {u, Au, . . . , A J ' _1 u} (2.1) 

with the Arnoldi process, where u is some starting vector. Let Vj € C"*-? be the ma- 
trix with orthonormal columns generated by the Arnoldi process spanning /C,-(A, u). 
Then we have the Arnoldi relation 

AVj = V J+1 H, (2.2) 

with Hj 6 C<-j + V xj ; see, e.g., [26] Section 6.3] and [32]. Let xo be an initial ap- 
proximation to the solution of a linear system we wish to solve. A Krylov subspace 
method for solving a linear system generates successively larger Krylov subspaces at 
each iteration; and at iteration j, the method generates an approximation of the form 
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Xj = x +tj, where tj S /C,(A, r ) and r = b — Ax . We choose tj by requiring that 
the residual r 3 ■ = b Axj satisfies some constraint. For example, solutions produced 
by GCR [U and GMRES [27J satisfy y 3 _L A/C/(A,r ), which is equivalent to tj 
solving the minimization problem 

tj = argmin ||b - A(x + t)|| , 

t6/Cj(A,r ) 

and this is equivalent to solving the smaller minimization problem 



yj = argmin 

yea 



H jy -||r ||e^ +1) 



where we use the notation to denote the jth Cartesian basis vector in W\ and 
setting Xj = xq + Vjyj. 

For a Krylov subspace method, one can describe a restarted version of that 
method. The memory needed to store Vj for GMRES increases with j, and a restart 
is often used when the size of Vj exceeds the limit of available memory. Recall that in 
restarted GMRES, often called GMRES (to), we run an m-step cycle of the GMRES 
method and compute an approximation x m . We then discard all Arnoldi vectors and 
use x m as the initial approximation for the next cycle of GMRES. This process is 
repeated until we achieve convergence. Adaptions of restarted GMRES to solve (11.31) 
and (|1.4j) have been previously proposed; see, e.g., [14] . 

It should be noted that methods based on the nonsymmetric Lanczos process have 
also been adapted for solving multiple shifted systems. Extensions of methods, such 
as BiCGStab [13] and QMR, have been developed [15]. A recently proposed method 
called IDR [35], which has been shown to be a generalization of BiCGStab [33], has 
also been extended to solve (|1.3I) and (|1.4p [19] . We will not deal with nonsymmetric 
Lanczos-based methods in this paper, but these alternatives are worth mentioning. 

Many methods for solving (jl.3|) and (ll.4[) use the fact that for any shift a, the 
Krylov subspace generated by A and b is invariant under the shift, i.e., 

/Cj(A,b) =£j(A + o-I,b), 

as long as the starting vectors are collinear, with a shifted Arnoldi relation similar 
to 

(A + <rI)Vj =V J+1 Hf ) . (2.3) 

Note that the shift-invariance no longer holds if general preconditioning is used. How- 
ever, polynomial preconditioning [16] would be appropriate in this setting. There has 
been recent work on choosing optimal polynomial preconditioners in the setting of 
solving multiple shifted systems |2 . In this project, though, we focus on the unpre- 
conditioned case, as in [14] and [22] . 

The shift-invariance property indicates that large savings in storage and time 
can be achieved by generating only one sequence of Krylov subspaces and solving 
all shifted systems in one Krylov subspace simultaneously. Suppose that the initial 
residuals of f|l .3|) and (|1.4|) are collinear. For non-restarted Krylov subspace meth- 
ods derived by applying a Petrov-Galerkin condition to the residual, it is straight- 
forward to formulate a shifted version of that method. We simply apply the same 
Petrov-Galerkin condition for all residuals. However, once restarting is introduced, 
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the situation becomes more complicated. If we project the residuals such that they 
are no longer collinear, at restart the Krylov subspace for the base system and that 
of the shifted systems are no longer identical. This loss of collinearity does not occur 
for methods such as restarted Full Orthogonalization Method (FOM) [30] . FOM is a 
method closely related to GMRES, and the residuals produced by FOM are all inher- 
ently collinear with the newest Arnoldi vector. In |13j . a general theorem is presented 
which describes conditions under which the residuals will be naturally collinear in 
this manner. However, as discussed by Frommer and Glassner in [TJ], the GMRES 
residual projection does not have this property. We describe the algorithm proposed 
in |14j in more detail. 

Frommer and Glassner [14] proposed a restarted GMRES method to solve (jl.3|) 
and (|1.4[) . Suppose that for the base system and the shifted systems we have the 
same initial approximation xq = Xq CT ^ = 0, meaning that the residuals for the base 



and shifted systems are the same, i.e., ro = r, 



b. A restarted GMRES method 



to solve (|1.3[) and (|1.4p will require that we enforce collinearity upon the residuals at 
the end of each cycle rather than simply minimizing all residuals. 

Suppose that our residuals for the shifted and base system satisfies the relation 



» 



Po*o 



(2.4) 



at the beginning of a cycle. For the base system, we can represent the GMRES 
minimized residual at the mth step as r m = r m (A)ro where the residual polynomial 
r m {t) is a polynomial of degree less than or equal to m such that r TO (0) = 1. We enforce 
the condition that the residual for the shifted system is parallel to the minimized 
residual of the base system, i.e., 



Pn 



(2.5) 



By straightforward computations in [14) . it is shown that if z m+ i 
then for (12.51) to hold, we must have 



= ||r || ei -H TO y„ 



H m yW + z m +iPm = Po ||r || 



(2.6) 



and x™ = Xq CT) + V m y„ ' , where 



H 



(<0 



H„ 



al r , 




lxro 



Thus, we can compute both and /3 m by solving the augmented linear system, 

»1 



"m z m+l 



yh 

Pn 



= /3 ||r ||e^ +1 >. 



(2.7) 



Note that we only compute the minimum residual solution using a Petrov-Galcrkin 
condition for the base system. We enforce collinearity of the residuals to compute the 
approximation for the shifted system. We can equivalently think of this as apply- 
ing an oblique rather than an orthogonal projection to the residual of the shifted 
system on A/C m (A, ro). It is shown in p~4j Lemmas 2.1 and 2.4] that a solution 
to (|2.7[) exists if and only if the residual polynomial r m (t) satisfies r m (— a) ^ 0; 
otherwise, the augmented system is singular. If we compute the QR-factorization, 
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Fig. 2.1. The simultaneous solution of a base system and four shifted systems using the shifted 
GMRES method. The base system A is constructed from the matrix conf5. O-OOI4.X4.-IOOO from 
Matrix Market f7f, which we call D. We form A = (1/k c + .001)1 — D for a value k c provided with 
the matrix. The matrix A is real-positive. GMRES(30) is used to solve the base system, and the 
shifted GMRES method produces approximations for the four shifted systems. The shifts are listed 
in the convergence plot. 



z m+1 = Qm+iR-m+ii then this condition can easily be verified by examining 
the last diagonal entry of R m +i to detect numerical singularity. In the case that the 
solution does not exist, we simply perform one more iteration and check this condi- 
tion again. It is pointed out in |14j . though, that for a positive-real matrix A (field 
of values being contained in the right half-plane) , restarted GMRES for shifted linear 
systems computes solutions at every iteration for all shifts a^' > and, in addition, 

we have ||r m || < ' for such shifts. The shifts applied in the setting of QCD yield 

a family of coefficient matrices which are, in theory, real-positive [14] . In Figure 12.11 
we demonstrate the convergence of the shifted GMRES method for simultaneously 
solving a family of five shifted linear systems. 

This is an efficient method for solving families of shifted linear systems, but it is 
based on restarted GMRES, meaning the method inherits the properties of stagnation 
and unpredictable convergence exhibited by restarted GMRES, see, e.g., [12 ] IT7 ] 129]. 
An attractive idea would be to combine Frommer and Glassner's method with a 
subspace augmentation or deflation technology. Morgan's GMRES-DR [22] is one 
candidate, and in [5], this method has been extended to simultaneously solve a fam- 
ily of shifted systems. However, it is restricted to the use of approximate invariant 
subspaces spanned by harmonic Ritz vectors. We cannot select other subspaces using 
different criteria. As a result, we cannot use the recycled subspace from one family 
of shifted linear systems to deflate the next one. In a situation in which we are run- 
ning multiple QCD simulations, we would like to accelerate convergence by deflating 
between different base matrices. What we seek is a deflation framework in which we 
can select approximations from an augmented Krylov subspace and enforce a residual 
collinearity condition. 

For A symmetric, this would be achievable. For symmetric positive definite ma- 
trices, the deflated conjugate gradient method [H] [28] has been shown to be effective 
and has the benefit of producing a residual at step m which is collinear with the 
newest Lanczos vector. In [25], the authors use the positive definiteness of the co- 
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efficient matrix A to construct an A-orthogonal deflation projector, and it is this 
A-orthogonality property that is used to prove collinearity. In the case that the 
matrix A is nonsymmetric, residual collinearity is not guaranteed. When extending 
this method to shifted, non-Hermitian systems (using the restarted FOM method of 
Simoncini [30), we would not have the benefit of inherent collinearity. 

Furthermore, in the case of symmetric, indefinite systems, Kilmer and de Sturler 
[18] extended the recycled MINRES method [41 to solve multiple shifted linear sys- 
tems (in this case with complex shifts). Their strategy relies on the fact that in the 
case of symmetric systems, there is no need to restart. Thus, their procedure produces 
approximations with non-collinear residuals. Such a procedure is not possible in the 
case of non-symmetric systems, as it would destroy the collinearity needed at restart. 

For nonsymmetric matrices, recycling- type techniques have been previously de- 
scribed. The GCRO method [37J allows the user to select the optimal correction over 
arbitrary subspaces. de Sturler [35J extends this concept by providing a framework 
for selecting the optimal subspace to retain from one cycle to the next so as to mini- 
mize the error produced by discarding information. This algorithm is called GCROT, 
where OT stands for optimal truncation. A simplified version of the GCROT ap- 
proach, based on restarted GMRES (called LGMRES) is presented in [3j. Parks et al. 
in [53] combine the ideas of [23] and [35J and extend them to a sequence of slowly- 
changing linear systems. They call their method GCRO-DR (Recycled GMRES). 

We briefly review this method, as described in |23j . In order to simplify notation, 
we drop the subscript i and discuss subspace recycling from the point of view of a 
single base system. Thus, we are simply solving (jl.3|) . Let us assume that we possess 
a matrix U G c nxfe j whose columns span a deflation subspace. We begin by obtaining 
an ortho normal basis for the subspace spanned by the columns of AU G C nxfe using 
the QR-factorization AU = CS and assigning U = US^ 1 . Now we have 

U G C nxk and C = AU such that C*C = I. (2.8) 

We run m — k steps of the Arnoldi process, but for the projected operator 
(I — CC*)A. This will allow us to construct a solution update x m = xo + t m such 
that t m G S, where S = 7£(U) + /C m _/-((I — CC*)A, ro). For this to work, we require 
that we begin with an approximation Xo such that our initial residual ro is orthogonal 
to 1Z(C). Thus, for an initial approximation xo and initial residual ?o = b — ASo, we 
must begin by orthogonally projecting the residual ro onto 1Z(C) and then updating 
xo, 

r = r - CC*7 and x = X + UC*? - (2.9) 

Let us now generate the Krylov subspace /C m _fc((I — CC*)A, ro) of dimension m — k. 
Let V m _fc G (£nx(m— fc) nave columns which form an orthonormal basis of 
/C m _fe((I — CC*)A,ro), where we have the usual Arnoldi relation 

(I - CC*)AV m _ fc - V m _ fc+1 H m _ & , i.e., 

AV m -fe = CC*AV m _fc + V m _fc+iH m _fc (2-10) 

with H m _fc G c( m - fe + 1 ) x (™-' c ) being the usual upper Hessenberg matrix. We can 
rewrite (|2.8[) together with (|2 . 10[) as the Arnoldi-like relation, 

A [U V m _ fe ] = [C V m _ fc+1 ] 



I 





B 



(2.11) 
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m-fe+ll 



where B — C*AV m _ fe . If we put 

v m = [u v m _ fe ] , w m+1 = [c 

then we can rewrite (|2 . 1 1 1) as 

AV m = W m+1 G 

At the end of the cycle, we can solve 



and G™ = 



Ik _B 

H m _fc 



(2.12) 



(2.13) 



x = argmm 

xex +TC(V„ 



lb- Axil 



by solving the small (m + 1) X m least-squares problem 



argmm 

yec™ 



I II (m+1) t=t 

kolK, +1 -G m y 



and setting 



x = x +V m y m and r = r - W m+ iG m y„ 



(2.14) 



(2.15) 



However, we do not actually need to solve this least squares problem. We can solve 
a smaller problem. We can decouple (|2.14p . and instead solve a small GMRES-likc 
minimization problem. Let 



yfcxi 

y(m-fc)xl 



(2.16) 



where y fcx i e C k and y( m _fc) x i G 



y(m-fc)xl 



argmm 

ygCm -fc 



k . We can then solve 

H TO _fey 



II (m-fc+l) 



and compute y/cxi = — By( TO _/-) x i. This is a direct consequence of the upper tri- 
angular structure of G m , and was already used in the proof of [37[ Theorem 2.2]. 
Furthermore, by this same theorem, the minimum residual norm 



I || (m—k+l) 



i-fcy(m-fe)xl 



is also the residual norm for the approximation computed over the augmented space, 
thus yielding a cheap residual norm computation. This last fact can also be seen as a 
consequence of the requirement that ro _L 1Z(C). Since we require that 

r m J_ K(C) 8 IC m - k+1 ((I - CC*)A, r ), 

and ro is already orthogonal to 1Z(C), the Recycled GMRES residual update reduces 
to an update from the projected Krylov subspace. In other words, in the residual 
update, there is no component from 1Z(C), and we can rewrite the update (|2.15j) 

x = x + Uy fcx i + V m _fey( m _fe)xi and r = r o - V m _ fe+ iH m _ fc y (m _ fc)xl . (2.17) 

At the end of the cycle, we construct a new recycled subspace; and if we have 
not already converged, we begin the next cycle. If we have converged, we save this 
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Fig. 2.2. The convergence curves of GMRES (solid black curve), GMRES(20) (solid grey 
curve), and GCR0DR(16,2) (dashed black curve). Observe that the latter two methods have the 
same storage requirements. The coefficient matrix and right-hand side are from the sherman5 system 
from Matrix Market \T$. We preconditioned with ILU(O). We see that for this problem, it suffices 
to retain a two-dimensional subspace between restart cycles to achieve convergence almost as good 
as full GMRES. 

constructed deflation subspace to use when solving the next linear system. In Fig- 
ure 12. 2\ we demonstrate the acceleration of convergence of Recycled GMRES with a 
two-dimensional recycled subspace, as compared to GMRES and GMRES (20). For 
the recycled subspace, we computed the two harmonic Ritz vectors associated to the 
harmonic Ritz values of smallest magnitude. 

Convergence results for augmented Krylov subspace methods have been previ- 
ously presented, see, e.g., [10j[25], but not much work has been done in the context of 
Recycled GMRES. de Sturler has presented some not- yet-published work that specifi- 
cally addresses the convergence behavior of optimal methods in which we recycle using 
the above framework |39|. This work asserts that the improvement of convergence 
bounds from recycling a particular subspace can be quantified according to the quality 
of the recycled subspace as an invariant subspace. A particular finding, backed up by 
empirical observation, is that an approximate invariant subspace of modest quality (as 
judged by the largest principle angle between U and C) will still yield improvements 
in bounds on the residual norm. 

It should be noted that for a single system, if we have no initial recycled space and 
compute harmonic Ritz vectors at each restart, GMRES with recycling is algebraically 
equivalent to Morgan's GMRES-DR [22]. Iterating orthogonally to an approximate 
invariant subspace to accelerate convergence of GMRES can be justified by the the- 
oretical work in [31 j - It was shown that the widely observed two-stage convergence 
behavior of GMRES, which has been termed superlinear convergence, is governed 
by how well the Krylov subspace approximates a certain eigenspace. Specifically, 
when the Krylov subspace contains a good approximation to the eigenspace (call 
this eigenspace iS) associated to eigenvalues hindering convergence, we will switch 
from the slow phase to the fast phase, and convergence will mimic that of GMRES 
on the projected operator Vg A where Vg is the orthogonal projector onto the or- 
thogonal complement of S. This analysis complements previous discussions of this 
phenomenon, see e.g., [4j [40] . 
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3. Recycled GMRES For Shifted Systems. Subspace recycling has shown 
great potential to improve the convergence of restarted methods, in many cases, 
without dramatically increasing the memory costs. Therefore, if we can incorporate 
GMRES for shifted linear systems into the recycling framework described in [23], we 
will have a storage-efficient method which will solve all shifted systems simultaneously 
but is less likely to suffer from problems which plague restarted GMRES. We denote 
this method Recycled GMRES for shifted systems. 

In terms of extending the work of Frommer and Glassner |14) . we present an 
ideal method, in which we construct multiple deflation spaces, one for each shift. 
However, such a method is memory-intensive. Furthermore, constructing the spaces 
is computationally expensive. It may be useful when there are only a small number of 
shifts and when our deflation space is not too large. Unfortunately, without multiple 
deflation spaces, a method which enforces the collinearity condition on shifted systems 
over an augmented Krylov subspace based on Recycled GMRES cannot exist using 
this framework, as we will show. 

Consider the pair of linear systems (|1.3j) and (jl.4|) and suppose, as in Section [2j 
that we have U and C, satisfying (|2.8j) . For initial approximations, we choose xo and 
Xg CT ' ) so that the initial residuals are collinear, i.e., fo = PqT^. In Recycled GMRES, 
the initial projection of the residual (|2.9p and resulting update of the solution rely on 
the relation (I2.8[) between U and C. This means that even though we can orthogonally 
project onto 1Z(C)^, we cannot easily update the approximation associated with 
the shifted system. In order to project the residual for the shifted system, we would 
need U (ct) such that 

C = AU = (A + o-I)U (<t) . (3.1) 

This would require an additional k vectors of storage for each shift, and can only 
be obtained with significant work. In the Appendix, we show that for a certain choice 
of U, we can obtain without solving (13. ip . 

Suppose that for a given C G c™ xfc with orthonormal columns, we already have 
matrices U and XJ^ satisfying the relation (|3.1j) . We make an observation on the 
relationship between the subspaces /C m ((I— CC*)A, v) and £ m ((I— CC*)(A+oT), v), 
for any vector v orthogonal to the range of C. 

Proposition 3.1. Let C be a matrix with orthonormal columns. Then v is in 
the orthogonal complement of the range of C, i.e., CC*v = if and only if 

/C TO ((I - CC*)A, v) = £ m ((I - CC*)(A + crl), v) for all m G N (3.2) 

Proof. First, suppose v _L 1Z(C). Since CC*v — 0, we have 

(I - CC*)(A + crl)v = (I - CC*)Av + cr(I - CC*)v = (I - CC*)Av + crv. 

Therefore, when restricted to vectors orthogonal to the range of C, we have that 

(I - CC*)(A + crl) = (I - CC*)A + crl. 

Furthermore, since any u G 7l((I — CC*)(A + crl)) is orthogonal to the range of C, 
we have 

[(I - CC*)(A + al)] j v = [(I - CC*)A + al] j v 
when applied to any v _L 1Z(C). Thus, 

/C m ((I - CC*)(A + crl), v) = /C m ((I - CC*)A + crl, v) = /C m ((I - CC*)A, v), 
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where the last equality follows from the shift invariance property of Krylov subspaces. 

Now, suppose (|3.2p holds, and let m = 2. Since (|3.2[) holds for any vector in the 
subspace, it holds for u G £2 ((I - CC*)A, v) \ Kx((I — CC*)A, v), i.e., we have 

u = aiv + a 2 (I - CC*)Av = ftv + /3 2 (I - CC*)Av + (3 2 a(I - CC*)v 

where a 2 and j3 2 are nonzero. This implies 

(a 2 - - CC*)Av - *3 2 a(I - CC> = (ft - Ql )v, 

and thus, v _L K(C). □ 

Thus, suppose we construct an augmented subspace 7\L(U)+/C m _fc((I— CC*)A, ro) 
for the base system and minimize the residual as described in |23j . From the aug- 
mented subspace 7^(11^)) + /C m _fc((I — CC*)A,r ), we can construct an approxi- 
mation for the shifted system by enforcing a collinearity condition on the residual 
of the shifted system. Thus, we can simultaneously solve both the base and shifted 
systems over augmented Krylov subspaces. Recall that in the presence of no deflation 
subspace, we have the two Arnoldi relations 



AV m = V. m+ iH m and (A + aI)V m = V m+ iH 
respectively for the base and shifted systems, where = H„ 



{<?) 







. From here, 



it is straightforward to construct an Arnoldi-like relation for the shifted system in the 
framework of subspace recycling, 



(A + (7I)[UW V m _ fc ] = [C V ro _ fc+1 ] 



H 



B 

tt(°0 



(3.3) 



We note that the same matrix B appears in (|3.3p as in (|2.11l) since 

(I - CC*)(A + aI)V m _ fc = V m _ fe+1 H^ fe 
(A + <rI)V m _ fe - CC* AV m _ fc - aCC*V m - k = V ro _ fe+l H^ fc 

(A + <rI)V m _ fc = CB + V m _ fe+1 H^ 



k ■ 



(3.4) 



where B = C*AV m _fc and where we used the fact that crCC*V. m _fc = 0. This is 
due to the fact that the columns of Vm-k are orthogonal to 1Z(C). We can follow 
the same arguments as in |14j and construct an approximation for the shifted system 
such that the residual is collinear to the one produced by Recycled GMRES for the 
base system, 



b-(A + aI)(x^ ) 



» 



[UM V m _ fe ]y 



[C V m _fe + iJ 
/3 r - [C V m _ fe+1 ] 



Ik 


B 





w (-) 

n m-fe 


I/, 


B 





m—k 


Ik 


B 





n m-k 



M) = 



M) = 



Pm I'm 

Pm [C V m _fc + i] Z r , 

Pm [C V m _fc + i] Z„ 

Pm [C V m _fc + i] Z r , 
Pm^m j 



(3.5) 
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which yields the linear system 



O) 



where 



We then update 



G 



(<0 







A)||r | 



B 



(3.6) 



r(«0 



(a) 



Observe that the approximation with collinear residual is drawn from a different 
augmented Krylov subspace than the minimum residual approximation of the base 
system, i.e., for rather than U. As it can be appreciated, this is a straightforward 
extension of the work of Frommer and Glassner [14] . 

4. Non-existence of Collinear Residuals. The algorithm presented in Sec- 
tion [3] is the most natural extension of the recycling framework to GMRES for shifted 
systems, but in terms of storage, it is not efficient since storage requirements increase 
for each new shift. Therefore, to develop an algorithm with fixed-storage requirements 
regardless of the number of shifts, we propose to only store the matrix U £ C nxk 
whose columns span the recycled subspace associated to the base system. Can we 
still extend the mechanics of shifted GMRES to this setting? The answer, in general, 
is no. 

The simple appearance of the shift in the Arnoldi relation, as in (|2.3j) . no longer 
holds in the Arnoldi-like relation (|2.13|) from GMRES with recycling. Observe that 
for V TO and W m+ i, defined as in (|2.12l) . 



(A + ctI) V m = W m+ i 



Ife 




B 



If we have 



Range(V m ) C Range(W m+ i) 



(4.1) 



the relation could be easily modified so that a relation similar to (|2.3p holds, allowing 
the collinearity condition to be enforced. However, this inclusion, in general, does 
not hold; the columns of U span an approximate invariant subspace of A, not a true 
invariant subspace. In general, the span of the columns of U will be different than 
that of C. The same is true of V m and W m +i, respectively. Similar observations are 
made in the context of Hermitian systems in [18] . 

There is one scenario in which (|4.ip does hold. Consider the situation in which we 
begin with no starting recycled space and compute harmonic Ritz vectors at the end 
of each cycle to pass into the next cycle. We run an m-step cycle of shifted GMRES, 
and at the end of that cycle, let the columns of U be k harmonic Ritz vectors, we 
compute C as before, and restart. Morgan [21] showed that for a harmonic Ritz pair 
(g, 6), the eigenvector residual Ag — 9g is a multiple of the GMRES residual r m . 
At the end of a cycle, if we compute k harmonic Ritz vectors and store them as the 
columns of U, then we know that 



Range(AU — UD) = span(r r , 



(4.2) 
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where D = diag(#i, . . . ,9k), the diagonal matrix containing the respective harmonic 
Ritz values associated to the columns of U. If we compute the QR-factorization of 
AU = CR and let U = UR" 1 , then for T = RDR 1 we have 

Range(C - UT) = span(r m ). 

At the beginning of the next cycle, we take 

vi=r m /||r m || (4.3) 

as the first Krylov vector; and in this case, the containment (I4.1j) holds. This is 
the same fact exploited in [8], where the authors observe that the augmented Krylov 
subspace is itself actually a larger Krylov subspace with a different starting vector. 
Thus, the shifted GMRES method can be applied directly to the Krylov subspace 
augmented with the harmonic Ritz vectors, as long as there was no deflation space at 
the beginning of the process. 

What about in the general setting? Let E be a matrix whose columns form a 
basis for the orthogonal complement of 1Z(C) © 1Z(\ m -k+i) in W 1 . We note that E 
needs not be computed; we use it here as a theoretical tool to develop our algorithm. 
We can write 



U = CY + V m _ fe+1 Z + EF, 



(4.4) 



where Y e C fexfe , Z e c( m - fc+1 ) xfe , and F e C^-™" 1 )** 5 . This yields the following 
imperfect Arnoldi-like relation for the shifted system, 



(A + al) [U V ro _ fe ] = [C V TO _ fe+ i] 
If we let 



B 



a [EF 0] . (4.5) 



I fe + (tY 



B 



(4.6) 



together with (|2.13l) . then the Arnoldi-like relation (|4.5p can be rewritten as 
(A + ri)V ra = W m+1 GW + a [EF 0] . 

At the end of the cycle, we minimize the residual of the base system accordingly, 
obtaining y m as in (|2 . 16[) . so that 



r m = ro - W m+ iG m y. 
||ro|| W m+i e 
W„ l+ iz m+ i, 



= ||ro|| Wm+ie^J' -W m+ iG m y„ 



where 



It- II c.( m+1 ) 

l r oll e fe+l 



G m y„ 



(4.7) 
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Now, for the shifted system, we would like to enforce the collinearity condition. If a 
collinear residual were to exist for the shifted system, then it would satisfy 

b - (A + <7l)(x^ + V m y^) = /3 m W m+1 z m+1 

r { a) - (A + rf)V m y£> = W m+1 z m+1 /3 m 

f3 r - (W m+1 G$ + a [EF 0] )y£) = W m+1 z m+1 /3 m 

/3 r = W m+1 (z m+lj 8 m + G^yt') 

+a[EF 0]yW (4.8) 

However, the shifted approximation yielding collinear residual exists only when 
a [EF 0] = 0. This is not generally satisfied when (|4.1[) is false. Observe that 
in the general case, ro G 7£(C) © 1Z(\ m -fc+i) while the right-hand side of (|4.8j) has a 
non-zero component in 1Z(E) = (7£(C)©7£(V m _ft+i)) . Thus, we state the conditions 
for existence (and nonexistence) of the collinear residual in the following theorem. 

Theorem 4.1. Let U and C be defined as in (|2.8[) . Suppose we have approx- 
imations Xo and x.^ to the solutions of (jl.3l) and (|1.4[) , respectively, such that the 
residuals Tq and are collinear, i.e., r = /3oi"o, and ro _L 1Z(C). Let r rn be the 
minimum residual solution produced by Recycled GMRES over the augmented Krylov 
subspace 7£(U) + /C m _fc((I — CC*)A, ro). Then one of the following is true: 
. K(\J) + /C m _ fc ((I - CC*)A, r ) C TZ(C) © /C m _ fc+1 ((I - CC*)A, r ) 
• There exists no approximation xf G 7£(U)+/C m _fc((I — CC*)A,ro) (|1.4p 
such that Tm is collinear to r m , i.e., ^ f3 m r m , for all f3 m G C. 

5. An Approximate Collinearity Scheme. In general, Theorem l4.1l states we 
do not have collinear residuals. What is the best we can do? Before proceeding with 
the mathematical details, it will be helpful to provide an overview of the strategy we 
are proposing and encode this into a schematic algorithm. This algorithm will allow 
us to get a good approximation for the shifted system at low cost. As before, the base 
system will be solved using cycles of GMRES with recycling. For the shifted system, 
though we can initially project the residual away from 1Z(C), we cannot perform the 
corresponding update of the solution. Instead, at Line [S] of Algorithm [STTJ we perform 
an update of the shifted system approximation which implicitly updates the residual 
by the perturbation of an orthogonal projection. We have 

xM=^+UC*?W. 

The updated residual can be written as 

r CT »=b-(A + aI)x ^ 

= b-(A + < rI)(x CT) +UC*? <T) ) 
= ? CT) - (A + d)UC*?W 

= r { a) - CC*r { Q a) - aUC*?o' T) (5-1) 

true orthogonal projection perturbation 



We also cannot enforce a residual collinearity condition at the end of each cycle. 
However, neglecting a term from the collinearity equation allows us to solve a nearby 
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approximate collinearity condition directly. We use the solution from this nearby 
equation to update the approximation for the shifted system. These corrections to 
the shifted system solution improve the residual but do not lead to convergence for the 
shifted system which will start with an expected improved approximation. We present 
analysis showing how much improvement is possible with this method. We terminate 
the iteration when the base system has converged. The algorithm can be applied 
recursively on the remaining unconverged shifted systems. This recursive method of 
solving one seed system at a time while choosing corrections for the approximations 
for the other systems has been previously suggested in the context of linear systems 
with multiple right-hand sides, see e.g., [6l 134]. 

We present an outline of this process in Algorithm 15.11 Note, in the presenta- 
tion of the algorithm, we solve multiple shifted systems in order to demonstrate the 
algorithm's amenability to recursion. We describe Algorithm 15. II in more detail. 

As we previously stated, EF is generally non-zero, meaning the collinear residual 



Algorithm 5.1: Schematic of Shifted Recycled GMRES with an Approximate 
Collinearity Condition 

Input : A 6 C" xn ; {cr {f) }^ =1 C C; U, C e C nxfe such that AU = C and 

C*C = I/.; Initial Approximations x and Xq CT ' such that residuals 
are collinear; e > 

1 x^xo,r = b Ax 

2 x «- x + UC*r, r <- r — CC*r; Project base residual 

3 xC- W ) <- r(^>) = b - Ax(" ( ") for all I 

4 for I = 1 to L do 
x (<T<<!> ) <- x^^) H-UC'r^"'; Update shifted approximation, but not 
an implicit residual projection 

6 while llr II > e do 



7 

8 

9 
10 
11 

12 

13 



Construct a basis of the subspace K. m ((I — CC*)A, r) 

Compute update t 6 Range(U) -I- /C m ((I — CC*)A,r) by minimizing 

residual using Recycled GMRES 

x^x + t; r^b — Ax 

for I = 1 to L do 

Compute update ') e Range(U) + JC m ((I - CC*)A,r) according 
to the approximate collinearity condition 
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Compute updated recycled subspace information U and C 

14 Clear any variables no longer needed 

15 if L > 2 then 
Make a recursive call to Algorithm 15.11 with A <s— A + cr^I, shifts 

} £=2 , approximations |x CT< ' | and updated recycled 
subspace matrix U 
17 else 

is Apply Recycled GMRES to L ) yielding shifted system approximation 
_ that achieves tolerance e 
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does not exist. However, if it is relatively small (e.g., in the sense of the 2-norm) 
perhaps we can disregard it. This yields an augmented linear system which can be 
solved directly, 

VA + GL'% ] =/3 ||r||ei™+ 1) , or (5.2) 



Thus, we proceed by solving this nearby problem and updating the shifted solution, 

X W=xW+V m yW. (5.3) 

For each restart cycle, we repeat this process for the shifted system. We stop when 
the base residual norm is below tolerance. As experiments show in Section [BJ the 
update (|5.3[) at each cycle improves the solution for the shifted system, at very little 
additional cost. When the residual norm for the base system reaches tolerance, the 
residual norm of the shifted system will have been reduced but, generally, not enough 
to satisfy the given tolerance. We then apply the GMRES with recycling algorithm 
with this approximate collinearity scheme to the remaining unsolved systems, taking 
one of the shifted systems as our new base system. Thus, this method is amenable to 
recursion on the number of shifts. If we have only one system remaining to solve, we 
simply use Recycled GMRES. 

Observe that for any number of shifts, we can easily form Gm' for each a at 
little additional cost. The matrices Y and Z in (I4.4[) must be computed only once 
per iteration, regardless of the number of shifted systems we are solving. However, 
additional shifts will require more recursive calls to the algorithm and, thus, more 
iterations. 

5.1. Analysis of the Approximate Collinearity Condition. We provide a 
simple analysis which justifies why the approximate collinearity condition can produce 
improved approximations to the solutions of the shifted systems and to understand 
how well we can expect the algorithm to perform. This analysis also yields a cheap 
way in which we can monitor the progress of the residuals of the shifted systems. 

Theorem 5.1. Suppose we begin the cycle as in (|4.8[) , with approximate collinear- 
ity between the base and shifted residuals, satisfying the relation 

r^to + sW. (5.4) 

Then if we perform a cycle of Recycled GMRES to reduce the residual of the base 
system and apply the approximate collinearity condition (|5.2[) . we have the relation 

r£ = PrnVm - <xEF (yW) + (5.5) 

V / l:fc 



(m+1) 



= A||r||e^ 
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Proof. We can write the residual produced by the approximate collinearity procedure 
for the shifted system, 



b- (A + ( tI)(x^ ) 



?W = b-(A + rf)xM 

r<*>-(A + rf)V m y# 
A,r + s<*> - (A + <7l)V m y ^ 
/3 r - (W ro+1 GW + a [EF 0] ) y# + 
A, ||r || W m+1 e^ - W m+1 G^yi? - a [EF 0] y&> 



(5.6) 



= p ||r || W^ie^^ - W m+ iG^y^) - m W m+1 z m+1 
-it [EF 0]y(„ ff) +sW 

= W m+1 (/3 ||r || egg* - G&>y&> - /3 m z„ i+1 

—a [EF 0]yW+sW. 



/3mW m+ iZ TO+ i 
/3mW m+ iZ m+ i 



Now using the approximate collinearity condition (|5.2[) and the fact that by definition 
Zm+i, we have that 



s(<0 _ 



/3 m r m -a[EF 0]yW + gW 



(5.7) 



Then we can write 



PrnXri 



aEF (yW)^ 



It should be noted that the term — crEF (y^P) + is a function of the 

quality of the recycled subspaces as well as of a. With the use of simple inequalities, 
we obtain an important corollary describing the amount of residual norm reduction 
we can expect for the shifted systems. 

Corollary 5.2. Under the same assumptions as in Theorem 15.11 if we apply 
the approximate collinearity condition (|5.2p to obtain a correction Xm' 1 for the shifted 
system, then the associated residual satisfies the following inequality, 



< 



A, 



H ||EF||||(yW) w 



This gives us important information about the performance of this method with 
regard to the shifted system. The norm of fm is dominated by a scaled multiple of the 



norm of r m as long as 



\v m \\ is larger compared to \a\ ||EF|| (y^M 



,(<0| 



As long as this is the case, we can expect a decrease in the norm of r m to yield a 
decrease in the norm of fm' Furthermore, we see that the amount of improvement 
we can expect from computing corrections for the shifted system according to (|5.2|l 



is affected by |er|, ||EF||, and 



We cannot control 



, and cr is 



dictated by the problem. However, the size of ||EF|| is connected to the quality of U 
as an approximation to an invariant subspace of A. This can seen by writing 



EF = U-(CY + V m _ fe+1 Z) 



(5.8) 
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and observing that the norm of this difference decreases as U becomes a better approx- 
imation of an invariant subspace of A. This is because of the way we defined Y and 
Z in (|4.4I) . which gives an orthogonal decomposition of U with components in 11(C) 
and /C m _fe+i((I — CC*)A,ro). As U approaches an invariant subspace, ||U — CY|| 
approaches zero. This implies that both ||Z|| and ||F|| approach zero. This can also 
be formulated in terms of a quantity called the containment gap [7], which quanti- 
fies how close a smaller subspace is to being contained in a larger subspace. It is a 
generalization of the notion of the gap between two subspaces of equal dimension [36 . 

We must be able to monitor the residuals associated with the shifted systems 
as well as the other terms in (I5.7P which give the relationship between the base 
residual and shifted residual. This is important because we need to know when 
the norm of the error introduced by the approximate collinearity condition, namely, 



-o-EF(y$ 



At that 



is of the same order of magnitude as 

hk 

point, we can no longer guarantee that updating the shifted residual via the approxi- 
mate collinearity condition will improve the shifted system approximation. Therefore, 
we should no longer update the shifted system approximation. 

Our analysis gives us a way to monitor these quantities. Observe that given <r, 
U, and C, if we compute y„' according to (15.21) . then from (|5.8[) . we can compute 
the product EF ^y^'j . Thus, we can keep track of the vector s^^which allows us 

to cheaply construct r„ according to (|5.5[) . At the expense of one vector of storage, 
we can store and update , providing a cheap way to compute rf. 

The vector can be easily accumulated. At the beginning of Algorithm 15. 11 we 
compute an initial value of according to (|5.4p . At Line [5] of Algorithm 15.11 
we update s (cr) <- s (ct) - trUC*?^ according to (|5TT|) . At Line El we update 
s O) ^ S M _ o-EF(y m ) 1:fc according to ([S~5]) . 

It is worth noting that the approximate collinearity condition turns into an exact 
collinearity condition if we possess a different deflation subspace for the shifted system. 
Observe that if we write 

U (ct) =U-o-(A + ( tI)- 1 EF, 

then we obtain an exact Arnoldi-like relation 

(A + rf)vL ff) =W m+1 G&>, (5.9) 

where V$ = [U( CT ) V m _ fc ]. If we select 6 + n(V$) and enforce the 
collinearity condition r„' = l3 m r m , then we get that (|5.2|) is the exact collinearity 



equation which must be solved to obtain the collinear residual. Thus, the singularity of 
the linear system arising from the approximate collinearity condition in our algorithm 
is equivalent to the nonexistence of an exactly collinear residual when we construct the 
approximate solution to (jl.4l) over a different augmented Krylov subspace. Of course, 
is not available to us in practice; so, its utility here is only as a theoretical tool. 

6. Numerical Results. In the following numerical experiments, we constructed 
recycled subspaces from harmonic Ritz vectors of the coefficient matrix associated 
with the base system (|1.3[) with respect to the augmented subspace. This is by no 
means the only way to generate a recycled space. A strength of the Recycled GMRES 
method is the ability to use any recycled subspace. We based our choice on the 
report in [23] that the use of harmonic Ritz vectors in the recycled space has given 
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1 125 250 1 125 250 

System 1 System 2 




250 



System 3 System 4 

Fig. 6.1. The performance of Recycled GMRES (RGMRES) for shifted systems with recursion 
on the number of unconverged shifted systems. We performed simple tests on a sequence of four 
1000 X 1000 bidiagonal matrices. For this test, m = 100 and k = 50. The first matrix is the 
bidiagonal matrix used in [8]. The other systems are constructed by applying 0(1) bidiagonal random 
perturbations to the first system using the sprandO Matlab function. The four shifts are 10 — 2 , 
10 , 1, and 10. For the shifted systems, the residuals are only computed at the end of each 
cycle, with residual norms represented by the circle, triangle, square, and cross, respectively. The 
curves originating from these symbols are the convergence curves for the Recycled GMRES iterations 
executed for each shift system when that system becomes the base system during a recursive call to 
the method. 



good results. In the figures, for each recursive call to the algorithm, the solid black 
line represents the convergence curve for the base system, while the different markers 
indicate residual norms for the shifted systems at the end of each restart cycle. 

It should also be noted when an update to an approximation is calculated accord- 
ing to the approximate collinearity condition, we simply check if the residual norm 
for each shifted system to see if it has increased or decreased. If it increases, the 
update is not accepted and all future updates of that approximation are halted in the 
current cycle. This is based on the assumption that if the residual norm increased, 
our strategy is no longer effective for that particular shifted residual. This strategy 
can still be accomplished by accumulating s^, as we have already described. 

In all experiments, when solving the first system in the sequence, there is no 
initial recycled subspace. Thus, (|4.2[) holds at the end of each cycle. This means 
that solving the first system using Algorithm 15.11 is equivalent to applying shifted 
GMRES-DR for shifted systems [8] . This can be easily seen in the convergence plots 
as all residuals are reduced in norm below tolerance when solving the base system. 
There are no recursive calls to the algorithm. There is one exception; in Figure 16.41 
we intentionally chose nonpositive shifts. In the first convergence curve, we see that 
two recursive calls to the algorithm were necessary for these two shifts. 

Our first experiment, in Figure RTT1 illustrates the performance of the Recycled 



SUBSPACE RECYCLING FOR SHIFTED LINEAR SYSTEMS 



19 




1 125 250 375 

System 1 



1 125 250 375 

System 2 




1 125 250 375 

System 3 



1 125 250 375 

System 4 



Fig. 6.2. Convergence curves when we apply Recycled GMRES to each shifted bidiagonal system 
sequentially. As in the previous experiment, m = 100 and k = 50. 



GMRES method for shifted linear systems on a sequence of four bidiagonal matri- 
ces. The first matrix, Bi, used in [5], is a bidiagonal matrix with the numbers 
{.1, 1, 2, . . . , 998, 999} on the diagonal and ones on the first superdiagonal and the 
other matrices are random bidiagonal perturbations of of Bi, with the perturbations 
having the same bidiagonal structure and having Frobenius norm 1. We see that, as 
predicted by Corollarv l5.il the amount of residual reduction achieved for the shifted 
systems is effected by the size of the shift. For the shift u\ = 10~ 2 , the relative resid- 
ual is reduced to 0(1O~ 4 ) during the solution of the base system while for 04 = 10, the 
relative residual is only reduced to O(10 _1 ). This experiment is more for illustrative 
purposes than to demonstrate superior performance. After convergence for the base 
system, we take one of the shifted systems as our new base system and reapply the 
algorithm for the smaller family of systems. For comparison, we present in Figure [fT2"l 
the convergence curves if we simply apply Recycled GMRES to each shifted system 
sequentially. 

For our second and third experiments, we test two sequences of QCD matrices 
from the University of Florida sparse matrix collection [S]. In the second experiment, 
we work with six 3072 x 3072 sample matrices (called D x through D 6 ) with filename 
prefix conf 5 . 0-0014x4. We can construct the coefficient matrix Aj = I — K^D* 
where «W is a parameter associated to the QCD problem. For each matrix, there 
exists some critical value n c such that for < n w < Kc , Aj is a real-positive matrix. 
Equivalently, for each Aj, we can write Aj = -777 1 — Dj where — tW < -hj < 00, 
and we can scale any right-hand-side so that we are solving the same problem. For 
each Di, k^' is included with the matrix, and in these experiments, all are in the 
interval [0.20, 0.22]. Frequently in QCD computations, we wish to solve with multiple 
parameters, and we can solve all these systems simultaneously using an algorithm for 
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. Shifted RGMRES 
Repeated RGMRES 
2 3 4 5 

System Number 
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Shifted RGMRES 
Repeated RGMRES 
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Fig. 6.3. The performance of Recycled GMRES for shifted systems on a sequence of seven 
small Wilson fermion matrices where for each matrix, we solve a linear system with the base system 
and those associated to the shifts, .001 , .002, .003, —.6, and — .5. In the left figure, we illustrate 
the performance of RMGRES(100,50) for shifted systems as compared to repeated applications of 
RGMRES (100, 50). In the figure on the right, we compare performance for different size recycled 
subspaces with m = 100 fixed. 
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Fig. 6.4. Convergence curves from the same experiment as in Fiaure [b\3\ but only for the first six 
systems. We again use m = 100 and k = 50. Observe that two of the shifted systems (corresponding 
to the negative shifts) require more work than the others for each base system including the first 
system, in which we started with no recycled subspace. These are the two shifts for which shifted 
GMRES (or shifted GMRES-DR) would not converge. Notice that in this case, we are still able to 
converge by applying Recycled GMRES at the end. 



shifted linear systems. 
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Fig. 6.5. Convergence curves for another sequence of six Wilson fermion matrices, of size 
49152 X 49152 with m = 100 and k = 50. 



We chose {.001, .002, .003, —.6, —.5} as our family of shifts. Observe that by the 

(i) 

definition of A, and k c , the two shifted coefficient matrices associated with the 
two negative shifts are not real-positive. These are not physically relevant for QCD 
computations. We chose negative shifts merely to demonstrate the robustness of the 
algorithm. In this experiment, GMRES for shifted systems was unable to produce 
approximations for long sequences of iterations (due to singularity of the augmented 
collinearity matrix). Since the shifted GMRES method did not converge for some 
systems, its performance was not included in the figure. However, as we have noted, it 
is not difficult to modify this algorithm to gracefully handle this situation by applying 
restarted GMRES to any unconverged shifted systems at the end of the process. We 
compared with another strategy, repeated applications of Recycled GMRES [33] for 
the base and shifted system. In Figure 1631 we present the matrix- vector product counts 
for each system for a particular recycled subspace dimension as well as the totals over 
seven systems for various recycled subspace dimensions. We see that our method is 
able to produce a 20% reduction in the number of matrix-vector products needed to 
solve these systems, when compared to repeated applications of Recycled GMRES. 
In Figure [6~4l we present the convergence curves for the first six QCD matrices. 

In the third experiment, we worked with another sequence of QCD matrices from 
Matrix Market [T] with filename prefixes conf5.4 and conf6.0. These matrices of 

(i) 

of size 49152 x 49152. We used the critical n c to construct our system matrices as 
in the second experiment, and we choose the shifts {.001, .002, .003, .01, .02} as in the 
second experiment. In Figure HT5l we see the convergence of our algorithm for these 
systems. 
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Table 6.1 

A comparison, in terms of iteration counts of the shifted GMRES algorithm (SGMRES) with 
the Recycled GMRES algorithm for shifted systems (RGMRES) in terms of iteration count for 
different cycle lengths m. The results presented are the total iterations for solving a sequence of 
eleven QCD systems from ]20f . 



m 


k 


SGMRES (m) 


RGMRES (m - k,k) 


ratio 


25 


12 


4297 


3880 


0.90 


50 


25 


3284 


2980 


0.91 


75 


37 


3108 


2816 


0.91 


100 


50 


3028 


2697 


0.89 


125 


67 


3058 


2612 


0.85 


150 


75 


2958 


2546 


0.86 


175 


87 


2962 


2499 


0.84 


200 


100 


2947 


2458 


0.83 


225 


112 


2860 


2410 


0.84 



In the fourth experiment, we work with a sequence of eleven QCD matrices ob- 
tained from [20]. These matrices were delivered already shifted to be positive-real. 
As in the previous experiment, they are also of size 49152 x 49152. In Table lOTTl we il- 
lustrate the performance of the algorithm when the total dimension of the augmented 
space increases and compared this performance to an equivalent instance of the shifted 
GMRES algorithm. What this means is that we compared the performance of shifted 
GMRES with cycle length m versus our algorithm with an k = \m/2\ dimension de- 
flation space and m — k cycle length. In this experiment, there are two shifts, {.8, .81}. 
Here we see the potential benefits that our algorithm can yield as the deflation dimen- 
sion increases. We recognize that in some cases, such as in the last example, the gain 
as compared with RGMRES is relatively small, and that a larger number of shifts 
(or different values for the shifts) may not give this advantage. This follows from the 
fact that each shift incurs an additional recursive call to the algorithm and additional 
iterations. There is no such increase for shifted GMRES. Therefore, for sufficiently 
large number of shifts, shifted GMRES will have an advantage. Which method will 
perform better depends on several factors including the number of shifts (as we just 
mentioned), the magnitude of the shifts, the size of deflation space, and the deflation 
space selection technique. What we have shown is that for certain problems, the 
recycling strategy may be worth considering. 

7. Conclusions. We have shown that the attractive idea of efficiently combining 
techniques to simultaneously solve a family of shifted linear systems with subspace 
recycling techniques presents difficulties which cannot be easily surmounted. As an 
alternative, we present a technique that does as good as we can hope for using the 
subspace recycling framework. While this technique does not simultaneously solve all 
systems, it allows us to cheaply compute improvements to the approximate solution 
to the shifted systems while computing the minimum residual solution over an aug- 
mented Krylov subspace. When the base system has been solved, the method can be 
called recursively on the remaining systems, where one unconverged shifted system 
becomes the new base system. Numerical results demonstrates that the method can 
be effective. 
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for his extensive comments, including the suggestion that Algorithm 15.11 is amenable 
to recursion. 

Appendix A. Constructing Multiple Deflation Spaces. We now return to 
the question of constructing multiple deflation spaces. 

Given a matrix U G C" xfe with columns spanning our recycled space, how do we 
compute U and U^' such that both 

AU = C and (A + crI)U ((j) = C (A.l) 

hold, with C*C =^Ifc? In GCRODR, we compute AU = C. Then we compute the 
QR- factorization C = CR and let U = UR _1 . One way to interpret this process 
is to consider that to solve one system, we need one recycled space satisfying (|2.8p 
which costs k matrix-vector products plus one basis orthogonalization to acquire. 

Now suppose we have two coefficient matrices A and (A + erl). If we have U 
such that AU = C, there is no efficient way to acquire JJ^. We cannot compute the 
image of U under A and A + al separately because this would not yield one matrix C. 
To get a pair of matrices U and XJ^ satisfying (|A.1[) . we premultiply U by both A 
and A + crl. In other words, let C = A(A + aI)U and compute the QR-factorization 
C = CR. Then we have U = (A + (rl)Ur 1 and = AUR 1 . Since matrices 
differing by multiples of the identity commute, we have AU = (A + aI)XJ^ = C. 
This easily can be generalized to any number of shifts. Given a sequence of shifts 
{cr 1 }^ C C, where we consider the base matrix as a system with shift o~\ = 0, the 



Algorithm A.l: Computing a Family of Recycled Spaces for Multiple Shifted 
Linear Systems 



Input : A e C nxn , {a^+l C C, U e 
Output: C G C nxfc such that C*C = I fc , {U^}*** such that 
(A + ct 1 I)U( <t ') = C for alii 

1 for i = l to s + 1 do 

2 |_ U( ff *> = U 



3 for i = 1 to s do 



4 
5 
6 
7 
8 
9 
10 
11 



QR 



U( CT =+i) «- AU( (7 «+ 1 ) + CT l U(' Ts + 1 ) 
Compute the QR-factorization U^ CTs + 

U^b+i) <_ Q 

for j = 1 to s do 
U <- U^R- 1 

fci <- (j - 1 mod s + 1) + (1 - \^])(s + 1) 
k 2 <- (j + 1 mod s + 1) + Lf±jJ (s + 1) 
U(^) <- Uf°"*i) + -<T 3 -)U 



12 C = AU^ 1 ) + a s+1 V^^ 

13 Compute the QR-factorization C 

14 C <- Q 

is for i = l to s + 1 do 

|_ U( CT ') <- U^-'R- 1 



QR 
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Shift 



Fig. A.l. The performance of Alaorithm [A~l\ for a collection of shifts taken from the interval 
[l0 -8 ,10 8 ]. For this figure, both axes are on a logarithmic scale. We compute the largest principal 
angle betweenK(C) and 7£((A+(tI)U( ct > ) to examine the accuracy. Observe that there is an increase 
in accuracy for the right-most shift, a = 10 s . The subspace associated to a = 10 s corresponds to 
U( CT s+i) i n Algorithm IA.il The computation of JJ^s+i) j s done directly rather than by using 
Proposition IA.il 



matrices 



n (a+o-o 



UR 1 and C 



s+1 



H(A + a. t ) 



i=l 



UR 



satisfy our requirements with C = CR being the QR- factorization of C. Since matri- 
ces differing by a multiple of the identity commute, we have that (A + aiT)\j( ai > = C 
for all i. This strategy has clear stability issues which must be addressed. Further- 
more, we wish to execute as few calls to the operator A as possible. Therefore, we 
propose a more stable implementation of this procedure. 

Though these matrices do commute, once we choose a particular multiplica- 
tion order, e.g., if C = A(A + aI)UR _1 , we must proceed intelligently to recover 
Jjfa) — AUR 1 without doing additional matrix vector products. To deal with this 
issue, we first introduce a straightforward proposition whose proof follows by simple 
algebraic manipulation. 

Proposition A.l. Let U = (A + ctiI)U. Then (A + er 2 I)U = U + (a 2 - oi)U. 
Furthermore, let U = CR be the QR-factorization. Then 



(A + a 2 I)UR- 



(<7 2 -<7i)UR- 



For clarity, we present a small example to motivate a general algorithm for this 
process. Suppose we have a family of three shifted systems A + ail, A + o^Ij and 
A + ct 3 I. Our goal is, given U e C nxk , to generate \J^\ XJ (a2 \ e C nxk such 

that for some C G C™ xfc such that C*C = I fc , we have 



(A + criI)U (cri) = C and (A + cr 2 I)U (<T2) = C and (A + a 3 I)V^ 3) = C. 
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bo 



10 



7 = 10 4 



10" 



10 



-16 



2 x 10 5 10 5 
Shift 




9.9 x 10 4 1.1 x 10 J 10 J 



Shift 




1.1 x 10 6 10 5 
Shift 



Fig. A. 2. The performance of Alaorithm \A7l\ for collections of shifts. Different lines depict the 
algorithm's performance for different numbers of shifts. Line darkness increases when more shifts 
are used (meaning the algorithm requires more iterations). We compute the largest principal angle 
between K(C) and TZ((A + o-I)U( CT >) to examine the accuracy. In the figures in the left column, 
shifts come from the interval [10 — 7,10 + 7]. In the right column, shifts come from the interval 
[10 5 - 7, 10 5 + 7] . Obs erve that the Algorithm IA.il performs well for larger values of 7 when the 
center is located at 10 5 . 



Furthermore, we want to do this using only 3fc matrix- vector products (or three block 
k matrix-vector products) and fc-dimensional basis orthogonalizations. This is a three- 
step process, and at each step, we will compute k matrix- vector products (or one block 
k matrix-vector product) and then apply Proposition I A. 1 1 to implicitly compute other 
matrix-vector products needed in that step. To begin, let 

Tt(< j i) _ TT("2) _ Tt(°"3) _ fr 

We want to 
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1. Compute V[ ai} = (A + a 2 I)U { cri) , v[ a2} = (A + (7 3 I)U^ 3) , and 
V[ a3) = (A + CTl I)U^ 3) 

2. Compute the QR-factorization U^ 3) = U^ a) Ri 

3. Compute U< CTl) = U^Rf 1 and U^ 2) = U^Rf 1 . 

However, we do not need to do these computations explicitly. If we compute 
U^ 3) = (A + <T 1 I)V { a2) and the QR-factorization U^ 3) = U^Ri, then by Propo- 
sition IA.11 we know that 

U^O = u(aa) + ((T2 _ (Tl ) U (- 1 ) Rr l (A.2) 

which then implies that, again by Proposition I A. 1 1 

For the next step, explicitly, we want to 

1. Compute U^ CTl) = (A + ffaljU^ , U 2 CT2) = (A + (7iI)U^ 3) , and 



U^ 3) = (A + a 2 I)U^ 2) 



2. Compute the QR-factorization U 2 CT3) = U^ 3) R 2 

3. Compute U 2 CTl) = U^RJ 1 and U^ 2) = U^RJ 1 . 

Again, however, we do not need to do all these computations explicitly. If we compute 
V { 2 3} = (A + cr 2 I)U^ 2) and the QR-factorization U 2 CT3) = U 2 CT3) R 2 , then we have 

U^ CTl) = (A + <7 3 I)u[ ai) (A.3) 
= (A + a 3 I)(\J ( ° 3) + (a 2 - ^U^R^ 1 ) 

= (A + C7 2 I)U^ 3) + (a 3 - C7 2 )XJ[ CT3) + (o 2 - a ± )(A + ^U^R^ 
= U 2 CT3) + (a 3 - a 2 )\j[ a3) + (a 2 - <n)(A + atfV^B^ 1 . 

By Proposition lA.il 

(A + a 3 l)V^ ] R^ = + (a 3 - ^U^R^ 1 , 

yielding 

U 2 ffl) = U 2 ff3) + ((73 - a 2 )U^ 3) + (a 2 - cnXU^ + (a 3 - ^U^R^ 1 ) 

= U 2 a3) + (a 3 - <j 2 )V { ° 3) + (a 2 - tn)V^ a) + (<7 2 - ai)((7 3 - ^U^Rf 1 
= \J ( f 3) + (a 3 - a 2 )\][ a3) + (a 2 - a^U^ + (a 2 - o{){a z - a 1 )U^ l) Rr 1 
= U 2 CT3) + (a 3 - njU^ + (a 2 - (70(^3 - ^U^R^ 1 
= U 2 ff3) + (c7 3 - a!)(U^ 3) + (c 2 - C 7 1 )U^ l) Rr 1 ). 

Finally, by (|A.2|) . we have that 

=U^ 3) +( < 7 3 - C 7 1 )U^ ) 

which, when we post multiply by R^ 1 , gives us 

U 2 ai) = U 2 CT3) + (a 3 - (7 1 )UJ <71) R 2 - 1 . 
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A similar derivation shows us that 

l4 CT2) = V { 2 ai) + (ai - a 2 )U^ 2) R 2 - 1 . 

From here, we can compute the QR-factorization (A + <73l)U2^ = CR3 and let 

U(«ri) = u("i) R -i and n (-2) = ui ff2) R3 1 and U<°*> = U^Rg \ 

For the general case, in which we have s + 1 shifts, this suggests an 
(s + l)-step algorithm for the computation for all i and C such that C*C = Ifc. 

We begin with some notation. For any shift Oi 1 U^' is such that (A + aiT)U l -' Ji ' > = C. 
At the jth step, we compute k matrix-vector products and generate an intermediate 
matrix \]^^ for each shift, and we denote the initial recycled space associated to cr.;, 
Uq"^ = U. At step j, we compute the product 

Uf > = (A + ^I)u£i 

where pj = + 1 mod s + l)+ [jtj\ (s+1). This formula for pj allows us to cyclically 
index through all the shifts using arithmetic on the group Z s+ i ={l,2,...,s + l}. For 
stability, we need to maintain orthogonality of one of the bases. Therefore, without 
loss of generality, we compute the QR-factorization XJ^ CTs+1 ^ = U^ Ts+1 ^R :) and update 

= XJ^^HJ 1 for each s+1. In each step of this process, we only compute k 
matrix- vector products explicitly (those associated to <J s +i)- The others are computed 
using Proposition IA.11 At the end of the process, we have computed sk matrix- 
vector products (or s block k matrix- vector products). We finish by computing C = 
(A + cr s+ iI)Ui CTs+1 ^ and one final QR-factorization C = CR. We then update U^ CTi ' = 
ui CT ^R -1 for all i. We present these steps more precisely in Algorithm lA.il 

In Figure IA.11 we demonstrate the performance of Algorithm IA.1I for a set of 
shifts ranging from 10 -8 to 10 s . We see that for shifts which are 0(1) or smaller 
in magnitude, the algorithm performs quite well. However, for larger shifts, we see 
a dramatic loss of accuracy, where we define accuracy to mean the largest principal 
angle between 1Z((A + ctI)U^ ct ^) and 1Z(C). In the computation of these subspaces, 
we must choose one shift for which we will compute U( CT ) directly. Then we use 
Proposition IA.11 to implicitly do the other computations. We chose the rightmost 
shift to be the one for which we explicitly do the computations. Thus, we get a much 
smaller principal angles for the subspace associated to the last shift. It's not affected 
by any numerical errors are causing the algorithm's performance degradation for the 
other shifts. Further experiments show that it is not just the magnitude of the shifts 
that matter. Experimentally, we observe that three other criteria on the set of shifts 
also seem to affect the stability of the algorithm. For a collection of real shifts, let 
{<7i}* =1 C [a, b] with a < b where a and b are the largest and smallest such numbers 
for which the interval contains the shifts, respectively. We have observed that \b — a 
affects stability. However, this effect is relative to the distance of the set of shifts from 
the origin, where we define distance as \b + a\ /2, the distance of the midpoint from 
the origin. The further the distance of the set from the origin, the larger \b — a\ can be 
before instability onsets. This is demonstrated by experiments shown in Figure lA. 21 
What determines the performance of Algorithm [Al] is how closely clustered the shifts 
are, where "close" is defined relative to the distance of the midpoint from the origin. 
Again, we see the drop in principal angle magnitude for the rightmost shift. 
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Suppose that the initial recycled space U is an approximate invariant subspace of 
A associated to small-magnitude eigenvalues. The process described in Algorithm lA.il 
might yield a matrix C with columns spanning a new approximate invariant subspace. 
This new subspace might not be associated to small magnitude eigenvalues, though. 
Because C is constructed implicitly by premultiplying U by different members of the 
family of shifted matrices, it is not clear what eigenspace 1Z(C) might approximate. 
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